Vertical Set Square Distance Based Clustering without Prior Knowledge of K

نویسندگان

Amal Perera

Taufik Abidin

Masum Serazi

George Hamer

William Perrizo

چکیده

Clustering is automated identification of groups of objects based on similarity. In clustering two major research issues are scalability and the requirement of domain knowledge to determine input parameters. Most approaches suggest the use of sampling to address the issue of scalability. However, sampling does not guarantee the best solution and can cause significant loss in accuracy. Most approaches also require the use of domain knowledge, trial and error techniques, or exhaustive searching to figure out the required input parameters. In this paper we introduce a new clustering technique based on the set square distance. Cluster membership is determined based on the set squared distance to the respective cluster. As in the case of mean for k-means and median for k-medoids, the cluster is represented by the entire cluster of points for each evaluation of membership. The set square distance for all n items can be computed efficiently in O(n) using a vertical data structure and a few pre-computed values. Special ordering of the set square distance is used to break the data into the “natural” clusters compared to the need of a known k for k-means or kmedoids type of partition clustering. Superior results are observed when the new clustering technique is compared with the classical k-means clustering. To prove the cluster quality and the resolution of the unknown k, data sets with known classes such as the iris data, the uci_kdd network intrusion data, and synthetic data are used. The scalability of the proposed technique is proved using a large RSI data set.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory used for in clustering problems. Previous researches showed that this theory can significantly increase the stability and performance of...

متن کامل

New distance and similarity measures for hesitant fuzzy soft sets

The hesitant fuzzy soft set (HFSS), as a combination of hesitant fuzzy and soft sets, is regarded as a useful tool for dealing with the uncertainty and ambiguity of real-world problems. In HFSSs, each element is defined in terms of several parameters with arbitrary membership degrees. In addition, distance and similarity measures are considered as the important tools in different areas such as ...

متن کامل

An Optimization K-Modes Clustering Algorithm with Elephant Herding Optimization Algorithm for Crime Clustering

The detection and prevention of crime, in the past few decades, required several years of research and analysis. However, today, thanks to smart systems based on data mining techniques, it is possible to detect and prevent crime in a considerably less time. Classification and clustering-based smart techniques can classify and cluster the crime-related samples. The most important factor in the c...

متن کامل

Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

متن کامل

Clustering of nasopharyngeal carcinoma intensity modulated radiation therapy plans based on k-means algorithm and geometrical features

Background: The design of intensity modulated radiation therapy (IMRT) plans is difficult and time-consuming. The retrieval of similar IMRT plans from the IMRT plan dataset can effectively improve the quality and efficiency of IMRT plans and automate the design of IMRT planning. However, the large IMRT plans datasets will bring inefficient retrieval result. Materials and Methods: An intensity-m...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2005

Vertical Set Square Distance Based Clustering without Prior Knowledge of K

نویسندگان

چکیده

منابع مشابه

Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

New distance and similarity measures for hesitant fuzzy soft sets

An Optimization K-Modes Clustering Algorithm with Elephant Herding Optimization Algorithm for Crime Clustering

Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

Clustering of nasopharyngeal carcinoma intensity modulated radiation therapy plans based on k-means algorithm and geometrical features

عنوان ژورنال:

اشتراک گذاری